L-rhamnose (L-Rha) is a deoxy-hexose sugar commonly found in nature. L-Rha catabolic 
pathways were previously characterized in various bacteria including Escherichia coli. 
Nevertheless, homology searches failed to recognize all the genes for the complete 
L-Rha utilization pathways in diverse microbial species involved in biomass decomposition. 
Moreover, the regulatory mechanisms of L-Rha catabolism have remained unclear in 
most species. A comparative genomics approach was used to reconstruct the L-Rha 
catabolic pathways and transcriptional regulons in the phyla Actinobacteria, Bacteroidetes, 
Chloroflexi, Firmicutes, Proteobacteria, and Thermotogae. The reconstructed pathways 
include multiple novel enzymes and transporters involved in the utilization of L-Rha and 
L-Rha-containing polymers. Large-scale regulon inference using bioinformatics revealed 
remarkable variations in transcriptional regulators for L-Rha utilization genes among 
bacteria. A novel bifunctional enzyme, L-rhamnulose-phosphate aldolase (RhaE) fused 
to L-lactaldehyde dehydrogenase (RhaW), which is not homologous to previously 
characterized L-Rha catabolic enzymes, was identified in diverse bacteria including 
Chloroflexi, Bacilli, and Alphaproteobacteria. By using in vitro biochemical assays we 
validated both enzymatic activities of the purified recombinant RhaEW proteins from 
Chloroflexus aurantiacus and Bacillus subtilis. Another novel enzyme of the L-Rha 
catabolism, L-lactaldehyde reductase (RhaZ), was identified in Gammaproteobacteria and 
experimentally validated by in vitro enzymatic assays using the recombinant protein 
from Salmonella typhimurium. C. aurantiacus induced transcription of the predicted L-Rha 
utilization genes when L-Rha was present in the growth medium and consumed L-Rha 
from the medium. This study provided comprehensive insights into L-Rha catabolism and its 
regulation in diverse Bacteria. 
INTRODUCTION 

L-rhamnose (L-Rha) is a deoxy-hexose sugar commonly found 
in plants as a part of complex pectin polysaccharides and in 
many bacteria as a common component of the cell wall (Buttke 
and Ingram, 1975; Giraud and Naismith, 2000). Many microor- 
ganisms including the Enterobacteriaceae and Rhizobiaceae are 
capable of utilizing L-Rha as a carbon source (Eagon, 1961). 
Plant-pathogenic species (such as Erwinia spp.) and saprophytic 
species (e.g., Bacillus subtilis) are able to degrade rhamnogalacturonans 
and other L-Rha-containing polysaccharides by a set 
of extracellular enzymes including rhamnogalacturonate lyases 
(termed RhiE in Erwinia spp.) and α-L-rhamnosidases (RhmA, 
RamA) (Laatu and Condemine, 2003; Ochiai et al., 2007; Avila 
et al., 2009). The resulting L-Rha and unsaturated rhamnogalacturonides 
can enter the cells by specific transport systems: 
the L-rhamnose permease RhaT in Enterobacteriaceae (Muiry 
et al., 1993) and the RhiT transporter in Erwinia chrysanthemi 
(Hugouvieux-Cotte-Pattat, 2004). In the latter species, the unsaturated 
galacturonyl hydrolase RhiN is used to release L-Rha 
and unsaturated galacturonate residues to promote their further 
catabolism in the cytoplasm (Hugouvieux-Cotte-Pattat, 2004; 
Rodionov et al., 2004). 

The canonical phosphorylated catabolic pathway for L-Rha 
described in enterobacteria comprises three enzymes, L-Rha 
isomerase (RhaA), L-rhamnulose kinase (RhaB), and 
L-rhamnulose-1-phosphate aldolase (RhaD) (Schwartz et al., 1974), 
which convert L-Rha to dihydroxyacetone phosphate (DHAP) 
and L-lactaldehyde (Akhy et al., 1984; Badia et al., 1989) 
(Figure 1). In addition, L-Rha mutarotase (RhaM) facilitates 
the interconversion of α and β anomers of L-Rha, providing 
the stereochemically less-favored anomer for the subsequent 
catabolic reactions (Richardson et al., 2008). The structures 



www.frontiersin.org 



December 2013 | Volume 4 | Article 407 | 1 



Rodionova et al. 



Catabolism of rhamnose in bacteria 






FIGURE 1 | Reconstruction of the L-rhamnose utilization pathways in 
bacteria. Solid gray arrows indicate enzymatic reactions, and broken 
arrows denote transport. Enzyme classes and families of transporters 
are shown in blue subscript. Multiple non-orthologous variants of 
proteins for several functional roles are highlighted by the same 
background color. Tentatively predicted functional roles are marked by 
asterisks. Components of a pathway variant present in C. aurantiacus 
are shown in red boxes. 



and reaction mechanisms of each of these four enzymes from 
Escherichia coli have been determined (Korndorfer et al., 2000; 
Kroemer et al., 2003; Ryu et al., 2005; Grueninger and Schulz, 
2006). Another L-Rha isomerase with broad substrate specificity 
(RhaI, 17% sequence identity to RhaA from E. coli) 
has been characterized in Pseudomonas stutzeri (Leang et al., 
2004; Yoshida et al., 2007). L-lactaldehyde is a common product 
of both the L-rhamnose and L-fucose catabolic pathways 
and is further metabolized to L-lactate by the aldehyde dehydrogenase 
AldA or to 1,2-propanediol by the lactaldehyde 
reductase RhaO/FucO under certain conditions (Baldoma and 
Aguilar, 1988; Zhu and Lin, 1989; Patel et al., 2008). An alternative 
nonphosphorylated catabolic pathway for L-Rha, comprising 
four metabolic enzymes (L-rhamnose-1-dehydrogenase, 
L-rhamnono-γ-lactonase, L-rhamnonate dehydratase, and 
L-2-keto-3-deoxyrhamnonate aldolase) by which L-Rha is converted 
to pyruvate and L-lactaldehyde, has been identified in fungi and 
two bacterial species, Azotobacter vinelandii and Sphingomonas sp. 
(Watanabe et al., 2008; Watanabe and Makino, 2009). 

Induction of the L-Rha utilization genes in E. coli is mediated 
by two rhamnose-responsive positive transcription factors 
(TFs) from the AraC family, RhaS and RhaR (Tobin and Schleif, 
1990; Egan and Schleif, 1993; Via et al., 1996). RhaR activates 
the rhaSR genes via binding to the inverted repeat of two 17 bp 
half sites separated by a 17 bp spacer. RhaS activates the 
rhaBAD and rhaT genes via binding to another inverted repeat 
of two sites whose sequence differs from the RhaR consensus 
binding site. In another bacterium, the plant pathogen Erwinia 
chrysanthemi from the order Enterobacteriales, the expanded RhaS 
regulon includes a similar set of genes involved in L-Rha utilization, 
as well as the rhamnogalacturonide utilization genes 
rhiTN (Hugouvieux-Cotte-Pattat, 2004). The L-Rha catabolic 
gene cluster in Bacteroides thetaiotaomicron is positively controlled 
by another AraC-family TF, which is non-orthologous to 
E. coli RhaR (16% identity) (Patel et al., 2008). In Rhizobium 
leguminosarum bv. trifolii, a novel negative TF of the DeoR family 
has been implicated in control of the L-Rha utilization regulon, 
which contains two divergently transcribed operons, rhaRSTPQUK 
and rhaDI, encoding an ABC transporter for L-Rha uptake 
(RhaSTPQ), an alternative kinase (RhaK, 19% identity to RhaB 
from E. coli), an isomerase (RhaI), and a mutarotase (RhaU, 41% 
identity to RhaM from E. coli) (Richardson et al., 2004, 2008; 
Richardson and Oresnik, 2007). 

Our initial genome analysis suggested the presence of a novel 
variant of the L-Rha utilization pathway in anoxygenic phototrophic 
bacteria from the Chloroflexi phylum. Indeed, the existence 
of such a pathway was implied by the presence of rhaA 
and rhaB gene orthologs and the absence of rhaD and rhaO 
genes in Chloroflexus aurantiacus. Moreover, the L-Rha catabolic 
pathway is not completely understood in many other bacterial 
species, including Bacillus subtilis and Streptomyces coelicolor. 
Mechanisms of transcriptional regulation of L-Rha utilization 
genes are also poorly understood in many species beyond the 
models. With the availability of hundreds of sequenced bac- 
terial genomes, it is possible to use comparative genomics to 
reconstruct metabolic pathways and regulatory networks in indi- 
vidual taxonomic groups of Bacteria (Rodionov et al., 2010, 2011; 
Ravcheev et al., 2011, 2013; Leyn et al., 2013). Genome context- 
based techniques, including the analysis of chromosomal gene 
clustering, protein fusion events, phylogenetic co-occurrence 
profiles, and the genomic inference of metabolic regulons, are 
highly efficient methods for elucidation of novel sugar catabolic 
pathways. In our previous studies, we combined the genomic 
reconstruction of metabolic and regulatory networks with experimental 
testing of selected bioinformatic predictions to map 
sugar catabolic pathways systematically in two diverse taxonomic 
groups of bacteria, Shewanella and Thermotoga (Rodionov 
et al., 2010, 2013). Furthermore, we have applied the integrated 
bioinformatic and experimental approaches to predict and 
validate novel metabolic pathways and transcriptional regulons 
involved in utilization of arabinose (Zhang et al., 2012), xylose 
(Gu et al., 2010), N-acetylglucosamine (Yang et al., 2006), 
N-acetylgalactosamine (Leyn et al., 2012), galacturonate (Rodionova 
et al., 2012a), and inositol (Rodionova et al., 2013) in diverse 
bacterial lineages. 

In this work, we combined genomics-based reconstruction 
of L-Rha utilization pathways and RhaR transcriptional regulons 
in bacteria from diverse taxonomic lineages with the 
experimental validation of the L-Rha utilization system in 
C. aurantiacus and two other microorganisms. A novel bifunctional 
enzyme (named RhaEW) catalyzing two consecutive steps 
in L-Rha catabolism, L-rhamnulose-phosphate aldolase and 
L-lactaldehyde dehydrogenase, was identified in diverse bacterial 
lineages including Actinobacteria, α-proteobacteria, Bacilli, 
Bacteroidetes, and Chloroflexi. The predicted dual function of 
RhaEW was validated by in vitro enzymatic assays with recombinant 
proteins from C. aurantiacus and B. subtilis. Another enzyme 
involved in L-lactaldehyde utilization in γ-proteobacteria, 
L-lactaldehyde reductase RhaZ, was identified and experimentally 
confirmed in Salmonella spp. Comparative analyses of upstream 
regions of the L-Rha utilization genes allowed identification of 
candidate DNA motifs for various groups of regulators from 
different TF families and reconstruction of putative rhamnose 
regulons. L-Rha-specific transcriptional induction and the predicted 
DNA binding motif of a novel DeoR-family regulator 
of the rha genes were experimentally confirmed in C. aurantiacus. 

MATERIALS AND METHODS 

GENOMIC RECONSTRUCTION OF RHAMNOSE UTILIZATION PATHWAYS 
AND REGULONS 

The comparative genomic analysis of the L-Rha utilization subsystem 
was performed using the SEED genomic platform (Overbeek 
et al., 2005), which allowed annotation and capture of gene 
functional roles, their assignment to metabolic subsystems, 
identification of non-orthologous gene displacements, and 
projection of the functional annotations across microbial 
genomes, as previously described for other sugar catabolic 
subsystems (Rodionov et al., 2010, 2013; Leyn et al., 2012; 
Rodionova et al., 2012a, 2013). The obtained functional gene 
annotations were captured in the SEED subsystem available 
online at http://pubseed.theseed.org/SubsysEditor.cgi?page=ShowSubsystem&subsystem=L-rhamnose_utilization 
and are summarized in Table S1 in the Supplementary Material. 

For reconstruction of RhaR regulons we used an estab- 
lished comparative genomics approach based on identification 
of candidate regulator-binding sites in closely related bacte- 
rial genomes implemented in the RegPredict Web server tool 
(regpredict.lbl.gov) (Novichkov et al., 2010). First, we identi- 
fied potential rhaR transcription factor genes that are located 
within the conserved neighborhoods of the L-Rha catabolic 
genes in bacterial genomes from each studied taxonomic lin- 
eage. Identification of orthologs in closely related genomes and 
gene neighborhood analysis were performed in MicrobesOnline 
(http://microbesonline.org/) (Dehal et al., 2010). To find the 
conserved DNA-binding motifs for each group of orthologous 
RhaR regulators, we used initial training sets of genes that 
are co-localized with rhaR orthologs (putative operons that contain 
at least one candidate L-Rha utilization gene and are 
located within a maximum of ten genes of a rhaR 
gene), and then we expanded each set with the most likely RhaR-regulated 
genes confirmed by the comparative genomics tests 
as well as functional considerations (i.e., involvement of candidate 
target genes in the L-Rha utilization pathway). Using the 
Discover Profile procedure in RegPredict, common DNA motifs 
with palindromic or direct repeat symmetry were identified and 
their corresponding position weight matrices (PWMs) were con- 
structed. The initial PWMs were used to scan the reference 
genomes and identify additional RhaR-regulated genes that share 
similar binding sites in their upstream regions. The conserved 
regulatory interactions were included in the reconstructed RhaR 
regulons using the clusters of co-regulated orthologous operons 
in RegPredict. Candidate sites associated with new members of 
the regulon were added to the training set, and the respective 
lineage-specific PWM was rebuilt to improve search accuracy. 
Sequence logos for the derived DNA-binding motifs were built 
using the WebLogo package (Crooks et al., 2004). The details of all 
reconstructed regulons are displayed in the RegPrecise database of 
regulons (Novichkov et al., 2013) available online at 
http://regprecise.lbl.gov/RegPrecise/collection_pathway.jsp?pathway_id=34. 
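
The PWM-based search described above can be illustrated with a minimal sketch. This is not the RegPredict implementation; it only shows the general idea of building a log-odds position weight matrix from aligned training sites and scanning an upstream region with it. The function names, the pseudocount, the uniform background, the single-strand scan, and the toy sequences are all assumptions made for illustration, and the training sites are hypothetical rather than real RhaR sites.

```python
import math

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds PWM from aligned sites of equal length.

    Assumes a uniform background and a simple pseudocount (illustrative choices).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    pwm = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score_site(pwm, site):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(col[base] for col, base in zip(pwm, site.upper()))

def scan(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if set(window) <= set("ACGT"):  # skip windows with ambiguous bases
            s = score_site(pwm, window)
            if s >= threshold:
                hits.append((start, window, s))
    return hits

# Hypothetical toy training set (NOT real RhaR binding sites).
training_sites = ["TTGACA", "TTGACA", "TTGATA", "TTGACT"]
pwm = build_pwm(training_sites)
# One common heuristic: use the lowest training-site score as the cutoff.
threshold = min(score_site(pwm, s) for s in training_sites)
upstream = "GGCCTTGACAGGCC"  # hypothetical upstream region
hits = scan(pwm, upstream, threshold)
```

In an iterative procedure like the one described in the text, sites found by the scan would be added to the training set and the matrix rebuilt; a fuller version would also scan the reverse complement strand.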

GENE CLONING AND PROTEIN PURIFICATION 

The rhaEW (Caur_2283) and rhaR (Caur_2290) genes from 
C. aurantiacus J-10-fl, the rhaEW (yuxG) gene from B. subtilis, 
and the rhaZ (STM4044) and rhaD (STM4045) genes from 
Salmonella enterica serovar Typhimurium LT2 were amplified by 
PCR from genomic DNA using specific primer pairs (see Table 
S2 in Supplementary Material). A pET-derived vector, pODG29 
(Gerdes et al., 2006), containing a T7 promoter and an N-terminal 
His6 tag, or a similar vector, pProEX HTb (Invitrogen), with a trc 
promoter was used for cloning and protein expression. The rhaR 
gene was cloned into the pSMT3 expression vector (Mossessova 
and Lima, 2000) (a kind gift of Dr. Lima from Cornell University). 
The obtained plasmid encodes a fusion between the RhaR 
protein and an N-terminal hexa-histidine Smt3 polypeptide 
(a yeast SUMO ortholog), which enhances protein solubility. 
The resulting plasmids were transformed into E. coli BL21/DE3 
or BL21 (Gibco-BRL, Rockville, MD). Recombinant proteins 
were overexpressed as fusions with an N-terminal His6 tag and 
purified to homogeneity using Ni2+-chelation chromatography. 
Cells were grown in LB medium (50 ml), induced by addition 
of 0.2 mM isopropyl-β-D-thiogalactopyranoside, and harvested 
after 4 h of additional shaking at 37°C (for Caur_2283 and 
Caur_2290) or 16 h of shaking at 25°C (for YuxG, STM4044, and 
STM4045). Harvested cells were resuspended in 20 mM HEPES 
buffer (pH 7) containing 100 mM NaCl, 0.03% Brij-35, 2 mM 
β-mercaptoethanol, and 2 mM phenylmethylsulfonyl fluoride 
(Sigma-Aldrich). Cells were lysed by incubation with lysozyme 
(1 mg/ml) for 30 min, followed by a freeze-thaw cycle and sonication. 
After centrifugation, Tris-HCl buffer (pH 8) was added to 
the supernatant (50 mM, final concentration), which was loaded 
onto a Ni-nitrilotriacetic acid (NTA) agarose minicolumn (0.3 ml) 
from Qiagen Inc. (Valencia, CA). After washing with starting 
buffer containing 1 M NaCl and 0.3% Brij-35, bound proteins 
were eluted with 0.3 ml of the same buffer supplemented with 
250 mM imidazole. The purified proteins were electrophoresed 
on a 12% (w/v) sodium dodecyl sulfate-polyacrylamide gel to 
monitor size and purity (>90%). Protein concentration was 
determined by the Quick Start Bradford Protein Assay kit from 
Bio-Rad. 

ENZYME ASSAYS 

Aldolase/dehydrogenase activities of the purified recombinant 
RhaEW proteins from C. aurantiacus (Ca_RhaEW) and B. subtilis 
(Bs_RhaEW), and the St_RhaD and St_RhaZ proteins from 
Salmonella typhimurium were tested by a direct NADH detection 
assay. Because L-rhamnulose-1-P is not commercially available, 
we used an enzymatic coupling assay with two upstream 
catabolic enzymes for the conversion of L-Rha to L-rhamnulose-1-P. 
The L-Rha isomerase RhaA from E. coli (Ec_RhaA) and the 
L-rhamnulose kinase RhaB from Thermotoga maritima (Tm_RhaB) 
were expressed in E. coli and purified as described previously 
(Rodionova et al., 2012b). For Ca_RhaEW assays, the purified 
recombinant enzymes, Ec_RhaA (2 μg) and Tm_RhaB (2 μg), 
were pre-incubated for 20 min at 37°C in 100 μl of reaction 
mixture containing 150 mM Tris-HCl (pH 8), 20 mM MgCl2, 
10 mM ATP, 1.4 mM NAD+, 10 μM ZnSO4, and 8 mM L-Rha. 
Then Ca_RhaEW (0.5 μg) was added to the assay mixture and 
the reduction of NAD+ was followed by the increase in absorbance at 
340 nm at different temperatures (30-70°C) in the spectrophotometer. 
For Bs_RhaEW, St_RhaD, and St_RhaZ assays, Ec_RhaA 
and Tm_RhaB were pre-incubated in a ratio of 40:1 (RhaA:RhaB) 
at 25°C for 40 min in a reaction mixture containing 50 mM Tris-HCl 
(pH 7.5), 20 mM MgCl2, 1 mM ATP, 50 mM KCl, 2.5 mM 
NAD+ or 0.25 mM NADH, 5 μM ZnCl2, and 2 mM L-Rha. 
Subsequently, either Bs_RhaEW (2.8 μg) or St_RhaD (10 μg) 
and St_RhaZ (2.8 μg) enzymes were added and the reduction 
of NAD+ or oxidation of NADH was monitored by the increase or 
decrease in absorbance at 340 nm, respectively, at 25°C in a final 
reaction volume of 200 μl. 
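
For readers who want to turn an absorbance trace from such an assay into a rate, the following sketch applies the Beer-Lambert law with the standard molar absorption coefficient of NADH at 340 nm (6220 M^-1 cm^-1). It is an illustration only, not the calculation reported by the authors: the function name, the 1 cm path length, and the example slope of 0.0622 absorbance units per minute are assumptions.

```python
NADH_EXTINCTION_M_CM = 6220.0  # molar absorption coefficient of NADH at 340 nm (M^-1 cm^-1)

def specific_activity(delta_a340_per_min, reaction_volume_ul, enzyme_ug,
                      path_length_cm=1.0):
    """Return specific activity in micromol NADH per min per mg enzyme.

    The sign of the slope is ignored, so the same function covers NAD+
    reduction (rising A340) and NADH oxidation (falling A340).
    path_length_cm=1.0 is an assumed cuvette path length.
    """
    # Beer-Lambert: concentration change (mol/L per min) = (dA/min) / (epsilon * l)
    rate_molar_per_min = abs(delta_a340_per_min) / (NADH_EXTINCTION_M_CM * path_length_cm)
    # mol/L * (volume in L) * 1e6 gives micromol per min in the reaction
    umol_per_min = rate_molar_per_min * (reaction_volume_ul * 1e-6) * 1e6
    # convert enzyme mass from micrograms to milligrams
    return umol_per_min / (enzyme_ug * 1e-3)

# Hypothetical slope; the volume and enzyme amount mirror the 200 ul / 2.8 ug assay format.
example = specific_activity(0.0622, reaction_volume_ul=200, enzyme_ug=2.8)
```

With these illustrative inputs the rate corresponds to 10 μM NADH per minute in the reaction, i.e., roughly 0.71 μmol per min per mg of enzyme.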



GC-MS ANALYSIS 

Four-step biochemical conversions of L-Rha to L-lactate and 
DHAP by mixtures of the three L-Rha catabolic enzymes were 
monitored by GC-MS. Samples from enzymatic assay mixtures 
(10 μl) were dried in a vacuum centrifuge at room temperature 
and derivatized at 80°C for 20 min with 75 μl of pyridine containing 
50 mg/ml methoxylamine or ethylhydroxylamine (for 
lactate detection). The solution was then incubated at 80°C for 60 min 
with 75 μl of N,O-bis-(trimethylsilyl)trifluoroacetamide or 
N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide (for lactate 
detection). After derivatization, the samples were centrifuged for 
1 min at 14,000 r.p.m. and the supernatant (1 μl) was transferred 
to vials for GC-MS analysis. GC-MS analyses were performed on a 
QP2010 Plus GC-MS instrument from Shimadzu (Columbia, MD) 
as previously described (Rodionova et al., 2012a, 2013). 

BACTERIAL STRAINS AND GROWTH CONDITIONS 

The yuxG (rhaEW) and yceI (niaP) disruption strains of B. subtilis 
were obtained from the joint Japanese and European B. subtilis 
consortium (Kobayashi et al., 2003). The latter strain, with 
an insertion in the niacin transporter niaP, was used as an isogenic 
negative control. Both strains were grown overnight at 
37°C in chemically defined medium containing D-glucose (4 g/l), 
L-tryptophan (50 mg/l), L-glutamine (2 g/l), K2HPO4 (10 g/l), 
KH2PO4 (6 g/l), sodium citrate (1 g/l), MgSO4 (0.2 g/l), K2SO4 
(2 g/l), FeCl3 (4 mg/l), and MnSO4 (0.2 mg/l) in the presence 
of erythromycin (0.5 mg/l) (pMUTIN2 marker). Overnight cultures 
were diluted ~10-fold to yield the same cell density (optical 
density at 600 nm of 0.05) in the defined medium lacking glucose 
and washed three times to remove residual glucose. Cells 
were grown in triplicate in one of two versions of the defined 
medium, containing either L-Rha (4 g/l) or no additional carbon source. 
C. aurantiacus J-10-fl was grown at 52°C in 25 ml screw-capped 
glass tubes completely filled with BG-11 medium (Stanier et al., 
1971) supplemented with 0.02% (w/v) NH4Cl and 2 mM 
NaHCO3. Yeast extract (YE; 0.2%) or 35 mM pyruvate, each 
with or without additional 20 mM L-Rha, was used as the main 
carbon source, and cultures were grown under microaerobic starting 
conditions in the light. Cultures were constantly mixed on a rotation 
wheel during incubation. Growth of cultures was monitored 
at 600 nm using an ELX-808IU microplate reader from BioTek 
Instruments Inc. (Winooski, VT). The concentration of L-Rha in 
culture fluids was determined by HPLC on an HPX 
78 (Bio-Rad) column. 

RT-PCR 

Individual transcript levels were measured for seven genes from 
C. aurantiacus: rhaB (Caur_2282), rhaF (Caur_2286), rhaR 
(Caur_2290), rhmA (Caur_0361), and Caur_0839 (NADH-flavin 
oxidoreductase/NADH oxidase). The latter housekeeping gene 
was used as a positive control since it was found to be highly 
expressed under both photoheterotrophic and chemoheterotrophic 
conditions in a previous proteome study (Cao 
et al., 2012). Total RNA was isolated from cells grown on BG-11 
medium supplied with YE, YE plus L-Rha, pyruvate, and pyruvate 
plus L-Rha under suboxic conditions in the light, and collected 
after 3 days at optical densities at 650 nm of 1.3, 0.9, 0.4, and 0.6, 
respectively. RNA was isolated using a phenol-chloroform extraction 
method adapted from Aiba et al. (1981) and Steunou et al. (2006). 
Cell pellets were resuspended in 250 μl 10 mM sodium acetate 
(pH 4.5) and 37.5 μl 500 mM Na2EDTA (pH 8.0), then mixed 
with 375 μl lysis buffer (10 mM sodium acetate, 2% SDS, pH 
adjusted to 4.5). Hot (65°C) acidic (pH 4.5) phenol (700 μl) was added, 
and the sample was vortexed and incubated at 65°C for 3 min. After 
centrifugation (17,000 × g, 2 min), the RNA was further purified 
by one phenol-chloroform-isoamyl alcohol (25:24:1) and one 
chloroform extraction. RNA was precipitated using 0.1 volume of 
10 M LiCl and 2.5 volumes of 100% EtOH at -80°C 
for at least 30 min, washed with 80% EtOH, and resuspended in 
DEPC-treated H2O. The RNA solution was treated with DNase 
I (New England Biolabs Inc.) and re-precipitated after an additional 
chloroform:isoamyl alcohol (24:1) extraction. The purified 
RNA was dissolved in DEPC-treated water. Semi-quantitative RT-PCR 
was conducted using a Bioline Tetro one-step RT-PCR kit 
following the manufacturer's protocol. The gene-specific primers 
for each gene tested are shown in Table S2 in Supplementary 
Material. For each reaction, one control for DNA contamination 
was included (same template as for RT-PCR, started at 
the RT-inactivation step) and a PCR positive control 
(using 10 ng whole genome DNA from C. aurantiacus as template) 
was used. PCR conditions were the same for each primer 
pair used. All started with a 30 min RT step at 42°C followed 
by an RT-inactivation step at 95°C. Then a single-step PCR for 
amplification of the genes from cDNA was conducted using 30 
cycles of 30 s denaturation at 95°C, 30 s annealing at 60°C, and 
90 s elongation at 72°C before cooling down to 10°C. 

DNA BINDING ASSAYS 

The interaction of the purified recombinant C. aurantiacus RhaR 
protein with its cognate DNA binding site was 
assessed using an electrophoretic mobility-shift assay (EMSA). 
The His6-Smt3 tag was cleaved from the purified RhaR protein 
by digestion with Ulp1 protease. Complementary DNA fragments 
containing the predicted 38-bp RhaR binding site from 
the Caur_2290 promoter region, flanked on each side by 
five guanosine residues (Table S2 in Supplementary Material), 
were synthesized by Integrated DNA Technologies. One 
oligonucleotide was 3'-labeled with biotin, whereas the complementary 
oligonucleotide was unlabeled. Double-stranded labeled DNA fragments 
were obtained by annealing the labeled oligonucleotides 
with unlabeled complementary oligonucleotides at a 1:10 ratio. 
The biotin-labeled 48-bp DNA fragment (0.2 nM) was incubated 
with increasing concentrations of the purified RhaR protein 
(10-1000 nM) in a total volume of 20 μl of the binding 
buffer containing 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 5 mM 
MgCl2, 1 mM DTT, 0.05% NP-40, and 2.5% glycerol. Poly(dI-dC) 
(1 μg) was added as a nonspecific competitor DNA to reduce 
non-specific binding. After 25 min of incubation at 50°C, the 
reaction mixtures were separated by electrophoresis on a 1.5% 
(w/v) agarose gel at room temperature. The DNA was transferred 
by electrophoresis onto a Hybond-N+ membrane and 
fixed by UV cross-linking. The biotin-labeled DNA was detected 
with the LightShift chemiluminescent EMSA kit (Thermo Fisher 
Scientific Inc., Rockford, IL, USA). An additional DNA fragment of 
the Caur_0003 gene upstream region (Table S2 in Supplementary 
Material) was used as a negative control. The effect of D-glucose, 
L-Rha, and L-rhamnulose (obtained by enzymatic conversion of 
L-Rha by Ec_RhaA) was tested by their addition to the incubation 
mixture. 

RESULTS 

COMPARATIVE GENOMICS OF L-RHAMNOSE UTILIZATION IN 
BACTERIA 

To reconstruct catabolic pathways and transcriptional regulons 
involved in L-Rha utilization in bacteria, we utilized 
the subsystem-based comparative genomics approach implemented 
in the RegPredict and the SEED Web resources 
(Overbeek et al., 2005; Novichkov et al., 2010). As a result, 
the L-Rha metabolic pathway genes and transcriptional regulons 
were identified in complete genomes of 55 representatives 
of diverse taxonomic groups of bacteria including 
the Actinomycetales, Bacteroidales, Chloroflexales, Bacillales, 
Rhizobiales, Enterobacteriales, and Thermotogales. The distribution 
of genes encoding the L-Rha catabolic enzymes and associated 
transporters and regulators across the studied species is 
summarized in Table S1 in Supplementary Material. The studied 
bacterial species possess many variations in key enzymes from 
the L-Rha catabolic pathway, as well as in mechanisms of sugar 
uptake and transcriptional regulation. Some of these variations 
are briefly described below, together with the novel functional 
variants of the L-Rha catabolic pathway and the novel transcriptional 
regulons for these pathways. 

L-rhamnose catabolic regulons 

The transcriptional regulator RhaS in E. coli belongs to the AraC 
protein family and controls the L-Rha transporter rhaT and the 
catabolic operon rhaBADU (Egan and Schleif, 1993; Via et al., 
1996). Orthologs of rhaS and these catabolic genes for L-Rha 
utilization are present in other Enterobacteriales, as well as in 
Tolumonas and Mannheimia spp. RhaS in E. chrysanthemi was 
additionally shown to regulate the rhiTN operon involved in 
the uptake and catabolism of rhamnogalacturonides, which are 
L-rhamnose-containing oligosaccharides (Hugouvieux-Cotte-Pattat, 2004). 
The analysis of upstream regions of RhaS-controlled genes and 
their orthologs in γ-proteobacteria resulted in identification of 
the putative RhaS-binding motif, which was used for identification 
of additional RhaS targets in the analyzed genomes 
(Figures 2B, 3B). 

Analysis of other taxonomic groups outside the γ-proteobacteria 
identified previously uncharacterized members of 
the LacI, DeoR, and AraC families as alternative transcriptional 
regulators of the L-rhamnose catabolic pathways (Figure 2). To 
infer novel L-Rha regulons in each taxonomic group, we applied 
the comparative genomics approach that combines identification 
of candidate regulator-binding sites with cross-genomic comparison 
of regulons. The upstream regions of L-Rha utilization 
genes in each group of genomes containing an orthologous TF 
were analyzed using a motif-recognition program to identify 
conserved TF-binding DNA motifs (Figure 3). The deduced 
palindromic DNA motifs of novel LacI-family regulators are 
FIGURE 2 | Genomic context of L-rhamnose catabolic genes and 
regulons in bacteria from seven diverse taxonomic lineages. Genes 
(shown by rectangles) with the same functional roles are marked in 
matching colors. Genes encoding the novel bifunctional enzyme RhaEW 
are in parentheses. Tentatively predicted functional roles are marked by 
asterisks. Transcriptional regulators are in black with the corresponding 
protein family name indicated by white text. Potential promoters are 



characteristic of DNA-binding sites of LacI-family regulators. The 
predicted binding motifs of DeoR-family RhaR regulators in four 
distinct taxonomic groups are characterized by unique sequences; 
however, each of them has a similar structure that includes two 
imperfect direct repeats with a periodicity of 10-11 bp. Novel 
AraC-family regulators of L-Rha metabolism in the Bacillales, 
Bacteroides, and Enterococcus groups also are characterized 
by unique DNA motifs with a common structure of a direct 
repeat with 21-bp periodicity. Among this large set of predicted 
L-Rha catabolic regulators, only two transcription factors, an 
AraC-type activator in B. thetaiotaomicron and a DeoR-type 
repressor in R. leguminosarum, have been shown experimentally 
to mediate the transcriptional control of L-Rha utilization genes 
in previous studies (Richardson et al., 2004; Patel et al., 2008), 
although specific DNA operator motifs of these two regulators 
were not reported before. 
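
The direct-repeat structure described above (two imperfect copies offset by a fixed number of base pairs) can be checked in a candidate site by comparing the sequence with a shifted copy of itself. The sketch below is only an illustration of that idea and is not the motif-recognition procedure used in this study; the function name, the tested shift range, and the 20-nt toy sequence (a 10-nt unit repeated with one mismatch) are hypothetical.

```python
def best_period(sequence, min_shift, max_shift):
    """Return (shift, identity) for the offset with the highest self-identity.

    identity is the fraction of positions i where sequence[i] equals
    sequence[i + shift]; a high value at shift p suggests a direct repeat
    with a periodicity of p bp.
    """
    seq = sequence.upper()
    best = (None, -1.0)
    for shift in range(min_shift, max_shift + 1):
        compared = len(seq) - shift
        if compared <= 0:
            continue
        matches = sum(1 for i in range(compared) if seq[i] == seq[i + shift])
        identity = matches / compared
        if identity > best[1]:
            best = (shift, identity)
    return best

# Hypothetical site: the unit ACGTTGCAAT followed by a copy with one mismatch.
toy_site = "ACGTTGCAAT" + "ACGTTGCTAT"
period, identity = best_period(toy_site, min_shift=8, max_shift=12)
```

For this toy sequence the best offset is 10 bp, with 9 of 10 compared positions identical. Real operator sites are shorter and noisier than this example, so such a score is better treated as a quick consistency check than as evidence on its own.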

A detailed description of the reconstructed L-Rha catabolic 
regulons is available in the RegPrecise database within the 



indicated by small arrows. Candidate regulator binding sites are shown by 
black circles with numbers corresponding to the DNA binding motifs in 
Figure 3, as well as by squares, trapezoids, stars, and triangles. Genomic 
locus tags of the first gene are indicated below each putative operon. 
Bacterial lineages: (A) Chloroflexales; (B) Enterobacteriales; (C) 
Actinomycetales; (D) Rhizobiales; (E) Bacillales; (F) Bacteroidales; (G) 
Lactobacillales; (H) Thermotogales. 



collection of regulons involved in L-Rha utilization (Novichkov 
et al., 2013). Overall, most of these TF regulons are local 
and control from one to several target operons per genome 
(Figure 2). In the Bacillales, RhaR and RhgR control genes 
involved in the utilization of L-rhamnose and rhamnogalacturonan, 
respectively (Leyn et al., 2013). In the Thermotogales, the 
DeoR-family RhaR regulator co-regulates genes involved in the 
utilization of L-Rha mono- and oligosaccharides (Rodionov et al., 
2013). In the Rhizobiales, RhaR from the DeoR family negatively 
controls the L-Rha catabolic operon (Richardson et al., 
2004), whereas RhiR from the LacI family is predicted to regulate 
the rhamnogalacturonide utilization gene cluster (named 
rhi). An orthologous LacI-family regulator controls the similar 
rhi gene locus in Erwinia spp. In the Actinomycetales, a novel 
LacI-type regulator (termed RhaR) co-regulates genes involved 
in the uptake and catabolism of L-Rha and L-Rha-containing 
oligosaccharides. In the Chloroflexales, two unique TFs control 
L-Rha metabolism: the DeoR-family regulator RhaR controls the 
FIGURE 3 | Consensus sequence logos for predicted DNA binding sites of 
transcriptional regulators of L-rhamnose catabolism in diverse bacterial 
lineages. Sequence logos representing the consensus binding site motifs 
were built using all candidate sites in each microbial lineage that are accessible 
in the RegPrecise database. Previously uncharacterized regulators with 
tentatively predicted DNA motifs are marked by asterisks. Bacterial lineages: 
(A) Chloroflexales; (B) Enterobacteriales; (C) Actinomycetales; (D) Rhizobiales; 
(E) Bacillales; (F) Bacteroidales; (G) Lactobacillales; (H) Thermotogales. 



L-Rha utilization operons in both Chloroflexus and Roseiflexus 
spp., while the LacI-family regulator RhmR controls the rhm 
operon involved in L-Rha oligosaccharide utilization in 
C. aurantiacus. 

In summary, at least seven non-orthologous types of TFs
appear to regulate the L-rhamnose utilization (rha) genes in
diverse bacterial lineages. Uptake and catabolism of
L-Rha-containing oligosaccharides are either co-regulated with rha
genes by the same TFs (e.g., RhaRs in Actinomycetales and
Thermotogales; RhaS in Enterobacteriales), or are under control of
other specialized TFs (RhgR in Bacillales, RhiR in Rhizobiales and
Erwinia, RhmR in Chloroflexus). In the third part of this study,
we experimentally validated the predicted DNA binding sites of
the RhaR regulator in C. aurantiacus.

L-rhamnose catabolic pathways 

Analysis of L-Rha regulons revealed various sets of genes that
are presumably involved in the L-rhamnose utilization subsystem
(Table S1 in Supplementary Material). By analyzing protein
similarities and genomic contexts for these genes, we inferred
their potential functional roles and reconstructed the pathways
(Figure 1). All four enzymatic steps of the reconstructed catabolic
pathways occur in many alternative forms. The most conserved
enzyme in the L-Rha subsystem is the L-rhamnulose
kinase RhaB, which is substituted by a non-orthologous kinase
from the same protein family in γ-proteobacteria (Rodionova
et al., 2012b). Two alternative types of L-rhamnulose isomerase
(RhaA and RhaI) are almost equally distributed among the
studied genomes. All analyzed lineages except the Bacillales
possess L-rhamnulose isomerases of a single type. Among
the Bacillales, all studied genomes have the RhaA isomerase,
whereas only B. licheniformis has the non-orthologous isozyme
RhaI.

The canonical form of L-rhamnulose-1-P aldolase (RhaD)
was found in γ-proteobacteria, Bacteroidales, Thermotogales,
and Lactobacillales. Instead of RhaD, the L-Rha catabolic gene
clusters in Actinomycetales, α-proteobacteria, Bacillales, and
Chloroflexales contain a chimeric gene encoding a two-domain
protein (e.g., yuxG in Bacillus subtilis). The uncharacterized
protein YuxG and its orthologs have an N-terminal class II
aldolase domain (PF00596 protein family in PFAM) fused to
a C-terminal short-chain dehydrogenase domain (PF00106).
We used DELTA-BLAST to search for distant homologues of
YuxG among proteins with experimentally determined functions.
The N-terminal domain of YuxG (named RhaE) is
distantly homologous to three E. coli enzymes, L-ribulose-5-phosphate
epimerase (15% identity, E-value 1e^^^), L-fuculose-1-phosphate
aldolase (11% identity, E-value 4e^^^), and the
canonical RhaD enzyme (14% identity, E-value 1e^^^). These
relationships suggest that it represents a non-orthologous
substitution of aldolase RhaD. The C-terminal domain of YuxG
(named RhaW) is homologous to various NADH- and NADPH-dependent
sugar dehydrogenases including sorbose dehydrogenase
from fungi (29% identity, E-value 3e^'^), 2,3-butanediol
dehydrogenase from Corynebacterium glutamicum (25% identity,
E-value 4e^'''), and sorbitol-6-phosphate dehydrogenase
from E. coli (22% identity, E-value 9e^°^). The phylogenetic
occurrence profile suggests that RhaW may function as the
missing L-lactaldehyde dehydrogenase/reductase. Thus, the
bifunctional protein RhaEW is tentatively predicted to catalyze
the two final reactions in the L-Rha catabolic pathway
(Figure 1).

Downstream enzymes for utilization of L-lactaldehyde varied
the most among the analyzed species. Reconstruction of the
RhaS regulon in γ-proteobacteria identified various genes that
are likely involved in utilization of L-lactaldehyde. The rhamnose
operons in S. typhimurium and five other species include
an additional gene (named rhaZ) that encodes a hypothetical
iron-containing alcohol dehydrogenase (PF00465). E. carotovora
has a single RhaS-regulated gene, aldA, encoding alcohol
dehydrogenase from another protein family (PF00171). In contrast,
the RhaS regulons in E. chrysanthemi and Mannheimia spp.
include the L-lactaldehyde reductase rhaO, whereas aldA and
rhaZ are absent from their genomes. These observations suggest
that γ-proteobacteria use three different enzymes and two
different pathways for the final stage of the L-rhamnose pathway
(Figure 1).
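
The occurrence pattern described in this paragraph can be tabulated with a short script. The Python sketch below is illustrative only: the dictionary lists just the lineages named in the text (not the full genome set), the helper name `terminal_enzyme_profile` is hypothetical, and the product assigned to rhaZ anticipates the enzymatic result reported later in this paper rather than the sequence-based prediction alone.

```python
from collections import defaultdict

# Candidate L-lactaldehyde-utilizing genes in RhaS regulons, transcribed
# from the text (only the lineages named there; not a full survey).
RHAS_REGULON_GENES = {
    "S. typhimurium": {"rhaZ"},
    "E. carotovora": {"aldA"},
    "E. chrysanthemi": {"rhaO"},
    "Mannheimia spp.": {"rhaO"},
}

# End product of L-lactaldehyde conversion for each enzyme. The rhaZ entry
# is an assumption at this point in the text; it follows the in vitro
# result described later (RhaZ acts as a reductase).
PRODUCT = {
    "aldA": "L-lactate",
    "rhaO": "L-1,2-propanediol",
    "rhaZ": "L-1,2-propanediol",
}


def terminal_enzyme_profile(regulons):
    """Group genomes by the terminal-enzyme gene found in their regulon."""
    by_gene = defaultdict(list)
    for genome, genes in regulons.items():
        for gene in sorted(genes):
            by_gene[gene].append(genome)
    return dict(by_gene)


profile = terminal_enzyme_profile(RHAS_REGULON_GENES)
enzymes = sorted(profile)
pathways = sorted({PRODUCT[gene] for gene in profile})
# True when every listed genome carries exactly one of the three genes.
mutually_exclusive = all(len(g) == 1 for g in RHAS_REGULON_GENES.values())

print("enzymes:", enzymes)
print("pathway end products:", pathways)
print("one terminal enzyme per genome:", mutually_exclusive)
```

Under these assumptions the table yields three enzyme genes mapping onto two end products, which matches the "three enzymes, two pathways" summary above.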



In summary, the subsystem reconstruction and genome context
analyses allowed us to predict the following novel candidate
genes: L-rhamnulose-1-P aldolase (RhaE) and two variants of
L-lactaldehyde-utilizing enzymes (RhaW and RhaZ) in diverse
bacterial genomes. In the second part of this study, we
experimentally validated the predicted functions of RhaEW from
B. subtilis and C. aurantiacus and RhaZ from S. typhimurium.

L-rhamnose transporters and upstream hydrolytic pathways 

Uptake of L-Rha in E. coli is mediated by the L-Rha-proton
symport protein, RhaT (Baldoma et al., 1990), which belongs to
the Drug/Metabolite Transporter (DMT) superfamily. An orthologous
L-Rha transporter was found in the genome context
of L-Rha utilization genes/regulons in other γ-proteobacteria
and in the Bacteroidales (Table S1 in Supplementary Material).
Another L-Rha transporter belonging to the ABC superfamily,
RhaSTPQ (designated RhaFGHJ here), was described in
R. leguminosarum (Richardson et al., 2004). In this study, we
identified orthologs within the L-Rha operons/regulons in all
other α-proteobacteria, as well as in several genomes from
the Chloroflexales, Actinomycetales, and Enterobacteriales orders.
A different putative L-Rha transporter (termed RhaY), which
belongs to the Sugar Porter (SP) family of the Major Facilitator
Superfamily (MFS), was identified in certain Bacillales and
Actinomycetales genomes. This functional assignment is supported
by the conserved co-localization on the chromosome (in
Mycobacterium/Nocardia spp.) and by predicted co-regulation
(via an upstream RhaR-binding site in Saccharopolyspora erythraea)
with other rha genes.

The predicted L-Rha regulons in many bacteria include several
glycoside hydrolases and transport systems involved in the
uptake of L-Rha-containing oligosaccharides into the cytoplasm
and their subsequent degradation to form L-Rha monosaccharides.
The RhaS-activated operon rhiTN is involved in the uptake
and hydrolysis of oligosaccharides produced during
rhamnogalacturonan catabolism in the plant-pathogenic species from
the order Enterobacteriales (Hugouvieux-Cotte-Pattat, 2004).
Another enterobacterium, S. typhimurium, possesses a different
RhaS-regulated transport system (named rhiABC), which is similar
to the C4-dicarboxylate transport system Dcu (Figure 2).
Based on the gene occurrence pattern and candidate co-regulation,
rhiABC is tentatively predicted to encode an alternative
transporter for rhamnogalacturonides, which replaces
RhiT in S. typhimurium. A different transport system from the
ABC family (named rhiLFG) and putative α-L-rhamnosidases
(ramA, rhmA) were detected within the RhaR regulons in several
Actinomycetales. In the Bacillales and Rhizobiales groups, as well
as in the Erwinia and Chloroflexus spp., homologous ABC
transporters and rhamnohydrolases belong to several novel
lineage-specific transcriptional regulons (RhgR, RhiR, and RhmR,
respectively).

In summary, the comparative genomics analysis of the L-Rha
catabolic subsystem in bacteria revealed extensive variation in
the components of the transport machinery. L-Rha transport systems
belong to at least three protein families. In addition to L-Rha
transporters, many L-Rha-utilizing bacteria possess systems for
active uptake of L-Rha-containing oligosaccharides.
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EXPERIMENTAL VALIDATION OF NOVEL RHAMNOSE CATABOLIC 
ENZYMES 

Novel aldolase/dehydrogenase RhaEW 

To provide biochemical evidence for the novel bifunctional
aldolase/dehydrogenase enzyme involved in L-Rha catabolism,
the recombinant protein RhaEW from C. aurantiacus (termed
Ca_RhaEW) was overexpressed in E. coli with an N-terminal
His6 tag, purified using Ni-NTA affinity chromatography, and
characterized in vitro by a coupled enzymatic assay using
spectrophotometry and GC-MS.

Bioinformatics analysis suggested that RhaEW is a bifunctional
enzyme catalyzing two sequential activities, L-rhamnulose-1-P
aldolase and L-lactaldehyde dehydrogenase (Figure 1). We
assayed the biochemical activity of the recombinant Ca_RhaEW
protein by monitoring the conversion of NAD+ to NADH at
340 nm as a result of the predicted L-lactaldehyde dehydrogenase
reaction. The peak of Ca_RhaEW activity (Vmax 2.9 U mg⁻¹)
was observed at 60-70°C (Figure 4A),
which is in agreement with the optimal temperature range for
the growth of C. aurantiacus (Hanada and Pierson, 2006).
Additionally, we tested the possibility that Ca_RhaEW acts as
an aldolase/reductase by supplying NADH rather than NAD+
in the reaction; no activity was seen under these conditions
(data not shown). Thus, Ca_RhaEW acts in vitro to convert
L-rhamnulose-1-P to L-lactate and DHAP, which is consistent with
the prediction made through comparative genomics analyses.

The formation of Ca_RhaEW reaction products was directly
confirmed by GC-MS profiling of reaction mixtures obtained by
overnight incubation of L-Rha with the Ca_RhaEW protein taken
alone or in combination with the upstream catabolic enzymes.
While incubation of L-Rha with Ca_RhaEW alone did not produce
any new peaks on the chromatogram, the addition to the
mixture of the Ec_RhaA and Tm_RhaB proteins led to a decrease
of two peaks corresponding to L-Rha (retention times 9.28 and
9.39 min) and the appearance of a series of novel peaks (Figure
S1 in Supplementary Material). By comparison with standards
and the analysis of electron ionization mass spectra (m/z 299),
the first two peaks with retention times 9.27 and 9.37 min were
attributed to DHAP, whereas the peak at retention time 7.75 min
was assigned as lactate. Additional peaks appearing in the coupled
enzymatic assay were attributed to the upstream intermediates of
the L-Rha catabolic pathway, L-rhamnulose (retention times 8.85
and 8.92 min) and L-rhamnulose-1-P (13.04 min). The moderate
consumption of L-Rha observed when only the Ec_RhaA and
Tm_RhaB enzymes were added increased substantially after addition
of Ca_RhaEW to the reaction mixture. Finally, neither DHAP
nor lactate was detected in the reaction mixture after exclusion of
NAD+, which is an essential cofactor of L-lactaldehyde
dehydrogenase. These results suggest that the activity of the
L-lactaldehyde dehydrogenase domain RhaW is essential for the
L-rhamnulose-1-P aldolase activity of the second domain in this
bifunctional enzyme.

To test the hypothesis that RhaEW from B. subtilis
functions in the catabolism of L-Rha in vivo, we performed
growth experiments in defined medium for two mutant B. subtilis
strains. One strain carried a knockout mutation in the gene
yuxG (rhaEW), whereas the second strain carried an intact version
of yuxG but had a knockout mutation in an unrelated gene,
yceI (encoding a niacin transporter), to serve as an isogenic
control. We expected that the growth of the B. subtilis yuxG mutant
strain would not be stimulated by the addition of L-Rha as a carbon
source when compared to the yceI mutant strain. The results
clearly demonstrate that the B. subtilis yuxG knockout mutant
is non-responsive to added L-Rha when compared to the yceI
knockout strain and to both strains grown in the absence of an
additional carbon source (Figure 4B). These data confirm that
RhaEW is required for L-Rha utilization in B. subtilis. The
B. subtilis RhaEW protein (Bs_RhaEW) was cloned, purified, and
tested by the same coupled enzymatic assay as described above
for Ca_RhaEW. The Bs_RhaEW protein showed weak but reproducible
activity, measured at 0.0127 ± 0.001 μmol mg protein⁻¹
min⁻¹ at 25°C. Controls removing the starting substrate (L-Rha),
Bs_RhaEW, or Ec_RhaB (effectively removing rhamnulose-1-P)
from the reaction yielded no measurable activity (Figure S2A in
Supplementary Materials).
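
Specific activities such as those reported above are derived from the rate of absorbance change at 340 nm. The following Python sketch shows the general Beer-Lambert arithmetic only; it is not the authors' analysis script. The function name `specific_activity` and the example inputs are hypothetical, and the molar extinction coefficient of NADH at 340 nm (6220 M⁻¹ cm⁻¹) is the standard literature value rather than a number taken from this paper.

```python
# Standard molar extinction coefficient of NADH at 340 nm (M^-1 cm^-1).
# Assumption: this literature value is not stated in the paper itself.
NADH_EXTINCTION_M_CM = 6220.0


def specific_activity(delta_a340_per_min, reaction_volume_ml, protein_mg,
                      path_length_cm=1.0):
    """Return specific activity in umol NADH min^-1 (mg protein)^-1.

    delta_a340_per_min: slope of A340 versus time; the sign is ignored, so
    the same function covers NADH formation and NADH oxidation.
    """
    # Beer-Lambert law: concentration change (mol/L per min) = dA / (eps * l)
    molar_rate = abs(delta_a340_per_min) / (NADH_EXTINCTION_M_CM * path_length_cm)
    # Convert mol/L to umol in the cuvette: 1e6 umol/mol times volume in L
    umol_per_min = molar_rate * 1e6 * (reaction_volume_ml / 1000.0)
    return umol_per_min / protein_mg


# Hypothetical example: 0.0622 A340 units per minute in a 1 mL reaction
# containing 0.1 mg of enzyme.
example = specific_activity(0.0622, reaction_volume_ml=1.0, protein_mg=0.1)
print(f"{example:.3f} umol min^-1 mg^-1")
```

Because the absolute slope is used, the same calculation applies to the NADH-consuming reductase and GPDH assays described in the next subsection.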

RhaZ functions as a L-lactaldehyde reductase in vitro 

We used the reconstituted L-Rha catabolic pathway to test the
prediction that Salmonella spp. harbor a novel L-lactaldehyde
dehydrogenase, distinct from that of E. coli and shared among
a subgroup of the γ-proteobacteria. We cloned, overexpressed,
and purified the recombinant proteins St_RhaD (predicted
aldolase) and St_RhaZ (predicted novel dehydrogenase) from
S. typhimurium to complete the in vitro pathway (Figure 1).
St_RhaD is 99% identical at the amino acid level to E. coli RhaD,
for which an aldolase function has been demonstrated (Schwartz
et al., 1974). To ensure that St_RhaD acts as an aldolase in
L-Rha catabolism, we performed two control assays to confirm
the production of DHAP and L-lactaldehyde. To test for the
production of DHAP, we used purified glycerol-3-P dehydrogenase
(GPDH) (Sigma) in an assay containing Ec_RhaA, Tm_RhaB,
and St_RhaD. If St_RhaD acts as an L-rhamnulose-1-P aldolase,
then the DHAP produced would be converted to glycerol-3-P
by GPDH with the oxidation of NADH to NAD+, monitored as
a decrease in absorbance at 340 nm. Likewise, it was expected
that if St_RhaD produced L-lactaldehyde, then the known E. coli
L-lactaldehyde dehydrogenase, AldA, should be active in a reaction
containing all three L-Rha catabolic enzymes, producing
L-lactate and converting NAD+ to NADH. The results of both
controls confirmed the activity of St_RhaD as an L-rhamnulose-1-P
aldolase (data not shown), making it possible to test the
prediction for St_RhaZ. The purified St_RhaZ protein was included
in an assay containing Ec_RhaA, Tm_RhaB, and St_RhaD, using
NAD+ as a cofactor. This reaction mixture should lead to the
conversion of L-lactaldehyde to L-lactate (as with the E. coli AldA
enzyme). Under these conditions, St_RhaZ did not show activity
as an L-lactaldehyde dehydrogenase. To assess the
alternative fate for L-lactaldehyde, which is conversion to
L-1,2-propanediol, we repeated the assay under identical conditions
with the exception of supplying NADH as the cofactor. St_RhaZ
was active under these conditions (Figure S2B in Supplementary
Materials), converting L-lactaldehyde to L-1,2-propanediol with
a specific activity of 0.13 ± 0.02 μmol mg protein⁻¹ min⁻¹. This
indicates that RhaZ functions as an L-lactaldehyde reductase
rather than an L-lactaldehyde dehydrogenase.

FIGURE 4 | Biochemical and physiological characterization of novel
aldolase/dehydrogenase RhaEW. (A) Temperature dependence of
enzymatic activity of recombinant RhaEW protein from C. aurantiacus
determined by a coupling colorimetric assay of the NAD-dependent RhaW
dehydrogenase activity. (B) Growth studies of B. subtilis knockout mutants
for yuxG (rhaEW) and yceI (niaP gene used as a control) grown in defined
medium in the presence of L-rhamnose, D-glucose, and no additional
carbon source (N.C.). Growth studies were conducted in triplicate.

EXPERIMENTAL VALIDATION OF RHAMNOSE UTILIZATION AND 
REGULON IN CHLOROFLEXUS AURANTIACUS 

The anoxygenic phototroph C. aurantiacus can grow
heterotrophically using various organic compounds under either oxic
conditions or anoxic conditions in light (Hanada and Pierson, 2006).
However, the ability of C. aurantiacus and other species from
the order Chloroflexales to utilize L-Rha has not been previously
investigated. In C. aurantiacus, the L-Rha utilization genes are
organized into a nine-gene rha operon, which is predicted to
be transcriptionally controlled by a novel DeoR-family regulator,
RhaR (Figure 2). An additional gene, termed rhmA, encoding a
potential α-L-rhamnosidase (Caur_0361), is potentially involved
in the utilization of L-Rha oligosaccharides by C. aurantiacus.
A novel LacI-family transcription factor, termed RhmR, potentially
regulates the RhmA-encoding operon, which also encodes
a potential transport system for uptake of L-Rha-containing
oligosaccharides, termed RhmEFG (Figure 2). In contrast to the
L-Rha utilization operon, which has orthologs in all sequenced
genomes of Chloroflexus and Roseiflexus spp., the rhmR/A gene
locus is only conserved in the closely related Chloroflexus sp.
strain Y-400-fl, but is absent in the other Chloroflexales. We
assessed the L-Rha utilization and regulon in C. aurantiacus by
a combination of in vivo and in vitro experimental approaches.

To validate L-Rha-specific induction of the predicted L-Rha
utilization genes in vivo, we performed RT-PCR with specific
primers designed for three rha operon genes, rhaR, rhaF, and
rhaB. Total RNA was isolated from C. aurantiacus grown in media
containing YE or pyruvate, with and without addition of L-Rha.
All three genes demonstrated elevated transcript levels in
the cells grown on either YE or pyruvate media supplied with
L-Rha compared to the cells grown in the absence of
L-Rha (Figure S3 in Supplementary Materials). In addition to
the rha operon genes, rhmA transcription was also highly elevated
in pyruvate-grown cells supplied with L-Rha. These results
confirm that the rha and rhm operons, which are predicted to be
controlled by the RhaR and RhmR transcription factors, respectively,
are transcriptionally induced by L-Rha. Additionally, the
L-Rha-grown culture samples of C. aurantiacus were analyzed by HPLC
to monitor the L-Rha consumption from the culture fluids. The
results confirm a high rate of L-Rha consumption in the samples
(Figure S3 in Supplementary Materials), thus confirming that the
L-Rha uptake and utilization system is functional in vivo.

The interaction of the predicted RhaR regulator with the
Caur_2209 (rhaR) upstream DNA fragment containing candidate
RhaR-binding sites in C. aurantiacus, and the influence
of potential sugar effectors on the protein-DNA interaction, were
assessed in vitro by EMSA (Figure 5). The synthetic 38-bp DNA
region containing a tandem repeat of four individual RhaR sites
(consensus sequence TCGAAA) was incubated with increasing
concentrations of the purified recombinant RhaR protein. The
incubation was performed at 50°C, which is close to the optimal
growth temperature of 55°C for C. aurantiacus. The EMSA
results (Figure S4 in Supplementary Material) are consistent with
the in silico predicted DNA operator region of RhaR. The addition
of D-glucose and L-Rha had no effect on the RhaR-DNA interaction,
whereas L-rhamnulose abolished the specific DNA-binding ability
of RhaR. The obtained results suggest that the RhaR repressor
binds to the operator region at the rha operon in the absence
of a sugar inducer, and that L-rhamnulose serves as a negative
regulator for RhaR in C. aurantiacus.
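
A tandem array of short sites like the one described above can be located by sliding the consensus along a sequence and counting mismatches. The Python sketch below is a minimal illustration of that idea, not the motif-search procedure used in this study: the function name `find_sites`, the mismatch threshold, and the 30-nt test sequence are hypothetical, and only the forward strand is scanned.

```python
def find_sites(sequence, consensus="TCGAAA", max_mismatches=1):
    """Return (start, window, mismatches) for every window of the sequence
    that differs from the consensus at no more than max_mismatches positions.
    Start positions are 0-based; only the given strand is scanned."""
    sequence = sequence.upper()
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical test sequence (not the C. aurantiacus fragment): three
# hexamer sites separated by 4-nt spacers, flanked by 2 nt on each side.
toy_region = "GCTCGAAAAATCTCGTAAGCACTCGAAAGC"

hits = find_sites(toy_region)
positions = [start for start, _, _ in hits]
spacing = [b - a for a, b in zip(positions, positions[1:])]

for start, window, mismatches in hits:
    print(start, window, mismatches)
print("spacing between site starts:", spacing)
```

In this toy example the sites are found at a regular 10-nt spacing, the kind of periodicity that distinguishes a tandem-repeat operator from scattered single hits.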

DISCUSSION 

L-Rha is the most common deoxy-hexose sugar in nature. In 
plants, it is a component of many glycosides and polysaccha- 
rides such as pectins and hemicelluloses (Peng et al., 2012). 
Among bacteria, L-Rha is found in the cell wall and as a 
part of the glycosylated carotenoids (Takaichi and Mochimaru, 
2007; Takaichi et al., 2010). Utilization of L-Rha and rhamnose- 
containing polysaccharides has previously been studied in several 
free-living and plant pathogenic microbial species from the phy- 
lum Proteobacteria, including members of the genera Escherichia, 
Erwinia, Rhizobium, Azotobacter, and Sphingomonas. Due to sig- 
nificant variations in sugar catabolic pathways in bacteria, the 
projection of this knowledge to the genomes of more distant 
species, including many species important for prospective bioen- 
ergy applications, is a challenging problem (Rodionov et al., 2010, 
2013). In this study, we used comparative genomics to reconstruct 
novel variants of catabolic pathways and novel transcriptional 
regulons for L-Rha utilization in the genomes of bacteria from 
ten taxonomic groups. 

FIGURE 5 | Experimental validation of RhaR regulon in C. aurantiacus.
(A) Conservation of predicted RhaR binding sites (boxed) identified in the
promoter regions of rha operons in C. aurantiacus J-10-fl (Caur), C. sp.
Y-400-fl (Chy400), C. aggregans DSM 9485 (Cagg), Roseiflexus sp. RS-1
(RS-1), and R. castenholzii DSM 13941 (Rcas). Distance to a start codon of
rhaR is indicated. A 38-bp fragment from C. aurantiacus used for DNA binding
assays is underlined. (B) Summary of the EMSA experiments assessing the
potential interaction between the recombinant RhaR protein and its predicted
DNA motif at the Caur_2209 (rhaR) gene. The disappearance of the unbound
DNA band (shown by "+") was observed upon the addition of increasing
concentrations of RhaR protein (0.25-1 μM). Addition of 2 mM of L-rhamnose
or D-glucose to the reaction mixture containing 1 μM of RhaR did not change
this pattern, whereas addition of 2 mM of L-rhamnulose led to re-appearance
of the unbound DNA band (shown by "-"). As a negative control, incubation
of RhaR protein (0.5 μM) with the upstream DNA fragment of Caur_0003 did not
reveal the disappearance of the unbound DNA band (shown by "-"). The EMSA
gel pictures are presented in Figure S4 in Supplementary Material. Asterisks
indicate the conserved nucleotides in the multiple alignment.

[Panel A: multiple alignment of rhaR upstream regions from Caur, Chy400,
Cagg, Rcas, and RS-1; alignment not reproduced.]

Panel B:
                        Caur_2209 (rhaR)    Caur_0003 (N.C.)
RhaR                    +                   -
RhaR + D-glucose        +                   N.D.
RhaR + L-Rha            +                   N.D.
RhaR + L-rhamnulose     -                   N.D.

Using bioinformatics analyses of L-Rha utilization genes, we
identified twelve groups of rhamnose-related transcriptional
regulators from different protein families, AraC, DeoR, and LacI,
and proposed binding site motifs for these regulators within
tentatively reconstructed regulons (Figure S5 in Supplementary
Material). Prior to this study, only four types of bacterial
transcriptional regulators related to L-Rha metabolism had been
identified. The AraC family includes at least five groups of
non-orthologous regulators of L-Rha metabolism. These regulators
have unique DNA motifs with a tandem repeat symmetry.
Activators from three AraC groups have been characterized
previously: RhaR and RhaS from E. coli and Erwinia spp., with
previously known DNA motifs, and RhaR from Bacteroides, with
a previously unknown DNA motif. The DeoR family includes at
least four non-orthologous groups of RhaR regulators that are
characterized by distinct DNA motifs with a tandem repeat
symmetry. Among them, only RhaR in Rhizobium spp. was described
previously (Richardson et al., 2004); however, its DNA binding
motif was not known before this study. All LacI-family regulons
of L-Rha utilization genes were analyzed for the first time
in this study. They are characterized by 20-bp palindromic DNA
motifs of four different consensus sequences. In summary, the
results of this comparative genomics study demonstrate significant
variability in the design and composition of transcriptional
regulons for L-Rha metabolism in bacteria. This study has
significantly increased our knowledge of the types and operator
sequences of transcriptional regulators for L-Rha utilization.
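
The two motif symmetries contrasted above, tandem (direct) repeats versus palindromes (inverted repeats), can be told apart with a simple comparison against the reverse complement. The Python sketch below is only an illustration of that distinction: the function names and both 20-nt example sequences are hypothetical and are not motifs reported in this study.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def palindrome_mismatches(seq):
    """Positions at which seq differs from its own reverse complement
    (0 for a perfect palindrome / inverted repeat)."""
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))


def repeat_mismatches(seq, period):
    """Positions at which seq differs from itself shifted by `period`
    (0 for a perfect direct repeat with that period)."""
    return sum(a != b for a, b in zip(seq, seq[period:]))


# Hypothetical 20-nt examples, built only to illustrate the two symmetries.
half_site = "ATTGTAACGC"
palindromic = half_site + reverse_complement(half_site)  # inverted repeat
tandem = "TCGAAAAATC" * 2                                 # direct repeat

print(palindromic, palindrome_mismatches(palindromic), repeat_mismatches(palindromic, 10))
print(tandem, palindrome_mismatches(tandem), repeat_mismatches(tandem, 10))
```

A palindromic site reads identically on both strands, which fits binding by a symmetric dimer, whereas a direct repeat presents the same half-site in the same orientation more than once.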

Based on genomic context analyses of the reconstructed regulons,
we have identified several novel enzymes and transporters
involved in L-Rha utilization (Figure 1). A novel enzyme with two
domains, termed RhaEW, encoded by the yuxG gene in B. subtilis
and its orthologs in other bacterial lineages, was found to catalyze
the last two steps in the catabolism of L-Rha, namely cleavage
of L-rhamnulose-1-P to produce DHAP and L-lactaldehyde
and oxidation of L-lactaldehyde to L-lactate. Thus, the RhaE
domain functions as a non-orthologous substitute for the classical
RhaD aldolase, whereas the function of the RhaW domain
is analogous to the aldehyde dehydrogenase AldA from E. coli.
A novel L-lactaldehyde reductase involved in L-Rha catabolism,
termed RhaZ, which is not homologous to the previously characterized
RhaO/FucO, was identified in many γ-proteobacteria. Both
functional predictions were experimentally validated in vitro by
enzymatic assays with the purified recombinant proteins from
C. aurantiacus and B. subtilis (for RhaEW), and S. typhimurium (for
RhaZ). The function of RhaEW in L-Rha utilization in vivo was
also confirmed by genetic techniques in B. subtilis. Interestingly,
genes encoding L-lactate dehydrogenases (lldD, lldEFG) belong
to the reconstructed RhaR regulons in certain genomes of the
Actinomycetales and Rhodobacterales that encode RhaEW. Thus,
the L-Rha utilization pathways in these species are probably
extended to produce pyruvate as one of the final products.

Orthologs of the novel aldolase/dehydrogenase RhaEW are
broadly distributed among diverse bacterial phyla including
Proteobacteria (α-subdivision), Actinobacteria, Chloroflexi,
Bacteroidetes, and Firmicutes (Bacillales), in which they
are always encoded within the rha gene loci (Figure S6 in
Supplementary Material). The L-rhamnulose-1-P aldolase
domain RhaE is distantly homologous to class II aldolases
including the analogous enzyme, RhaD, and the L-fuculose-1-P
aldolase, FucA, from E. coli. The tertiary structures and catalytic
mechanisms for these enzymes have been determined (Dreyer
and Schulz, 1996; Grueninger and Schulz, 2008). We aligned the
amino acid sequences of all three enzymes using the multiple
protein sequence and structure alignment server PROMALS3D
(Pei et al., 2008) (Figure S7 in Supplementary Material). Class
II aldolases are zinc-dependent enzymes, in which the metal
ion is used for enolate stabilization during catalysis. In RhaD,
the Zn2+ ion is chelated by three histidines, His^'*^, His'*^, and
His^'^, which are conserved in all RhaE proteins. An Asp residue
in RhaE replaces the catalytically important Glu'^'' in RhaD,
which performs the nucleophilic attack of the C3 atom of DHAP.
This conservative substitution suggests that this Asp may play
a similar role in RhaE. The Gly^^, Asn^', and Gly^"* residues
that are involved in phosphate binding in FucA (Dreyer and
Schulz, 1996) are conserved in both RhaD and RhaE enzymes.
Conservation of the catalytically important amino acids in both
types of L-rhamnulose-1-P aldolases suggests a similar position of
the active site and a similar catalytic mechanism.

In summary, the phosphorylated catabolic pathway for L-Rha
contains a large number of alternative enzymes including
RhaI/RhaA, RhaB/RhaK, RhaD/RhaE, RhaO/RhaZ, and
RhaW/AldA (Figure 1) and is widely distributed among diverse
bacterial phyla. An alternative pathway for the nonphosphorylated
L-Rha catabolism that utilizes a unique subset of catabolic
enzymes was found only in a small number of proteobacteria
(Table S1 in Supplementary Material). In addition to numerous
variations among enzymes and transcriptional regulators associated
with the L-Rha catabolic pathway, a similarly high level
of variations and non-orthologous displacements is observed for
the components of transport machinery. The L-Rha permease,
RhaT, which is characteristic of members of the Enterobacteriales
and Bacteroidales, appears to be functionally replaced by either
a permease from a different family in some Actinomycetales
and Bacillales or an ABC cassette in α-proteobacteria and
Chloroflexales. In other genomes, no candidate transporter specific
for L-Rha was detected; however, the reconstructed L-Rha
pathways and regulons in these species include transport systems
and hydrolytic enzymes for L-Rha oligosaccharides (e.g.,
rhamnogalacturonides). Some of the latter species are known to
grow on L-Rha, such as B. subtilis (this study) and T. maritima
(Rodionov et al., 2013); thus, we propose that the predicted L-Rha
oligosaccharide transporters in these species are also capable of
L-Rha uptake.

Previous studies of L-Rha catabolism in E. coli and Salmonella
revealed a differential fate for L-Rha under aerobic and anaerobic
conditions in E. coli, but not in Salmonella (Baldoma et al., 1988;
Obradors et al., 1988). E. coli oxidizes L-lactaldehyde to L-lactate
via the activity of AldA under aerobic conditions and reduces
L-lactaldehyde to L-1,2-propanediol via the activity of FucO
under anaerobic conditions (Figure 1). In contrast, Salmonella
produces L-1,2-propanediol under both aerobic and anaerobic
conditions when metabolizing L-Rha. The identification of
Salmonella RhaZ as an L-lactaldehyde reductase is consistent with
these observations. Salmonella produces 1:1 molar equivalents of
L-1,2-propanediol from the catabolism of L-Rha under both aerobic
and anaerobic conditions, with growth yields higher than
E. coli under anaerobic conditions (Baldoma et al., 1988). The
production of L-1,2-propanediol through renewable, biological
methods is of high importance given the current chemical-based
processes of production and the high use of L-1,2-propanediol
in many commercial products (Cameron et al., 1998). There are
several examples of recent bioengineering strategies to improve
L-1,2-propanediol production in E. coli (Clomburg and Gonzalez,
2011), cyanobacteria (Li and Liao, 2013), and Saccharomyces
(Jung et al., 2011), in which each strategy uses glycerol as a starting
substrate. The observation of differential fates for L-Rha in E. coli
and Salmonella, the identification of the activity of RhaZ, putative
transport systems for rhamnogalacturonides, and predicted regulatory
mechanisms in Salmonella raise possibilities for exploring
alternative biological production strategies of the commercially
important L-1,2-propanediol from L-Rha-containing substrates,
though L-Rha itself remains an expensive substrate (Cameron
et al., 1998).

C. aurantiacus and other filamentous anoxygenic phototrophic
bacteria from the Chloroflexaceae family are commonly found
in the upper layers of microbial mats in hot springs (50-62°C),
with cyanobacteria growing together with chloroflexi. Although
Chloroflexus spp. can grow heterotrophically on various organic
carbon sources, their sugar utilization pathways remained
largely unknown before this work. Here, we identified and
characterized a novel variant of the L-Rha catabolic pathway in
C. aurantiacus, which includes the L-Rha isomerase RhaA, kinase
RhaB, and a novel bifunctional enzyme, RhaEW, that catalyzes
the last two steps of the pathway. C. aurantiacus transcribed genes
for L-Rha utilization when L-Rha was present in the growth
medium and consumed L-Rha from the medium. The ecophysiological
importance of the L-Rha utilization pathway in members
of the Chloroflexales is yet to be elucidated. One possibility is
that cyanobacteria commonly co-occurring with chloroflexi in
hot spring microbial mats may provide them with L-Rha. In such
microbial mats, cyanobacteria are primary producers that are
thought to cross-feed low-molecular-weight organic compounds
(e.g., lactate, acetate, glycolate) to members of the Chloroflexales
(van der Meer et al., 2003, 2007). There are several potential
sources of L-Rha in cyanobacteria including lipopolysaccharides
in the outer membrane (Buttke and Ingram, 1975) and glycosylated
carotenoids in the cytoplasmic and outer membrane that
protect the cell against photooxidative damage (Takaichi and
Mochimaru, 2007; Graham and Bryant, 2009). The exact source
of L-Rha from a primary producer and its significance for possible
metabolite exchange in the mat community require further
investigation.
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